Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program
Abstract
Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplantation (HSCT), where donor/recipient suitability is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely on a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This inaccuracy directly affects HSCT matching, as well as diversity measures in broader fields such as species richness. We therefore developed a power-law-based estimator of allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Applying our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of European American haplotypes are represented by only one individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based on sampling 3.97% of the population, leaving a large number of haplotypes unobserved. Population coverage, however, is much higher, at 99.4%, because 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse, suggesting that the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law estimator with that of classical diversity estimators such as capture-recapture, Chao, ACE, and jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates).
This suggests that power-law based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.
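To make the contrast between classical and power-law reasoning concrete, here is a minimal sketch under stated assumptions; it is not the authors' estimator. The input is a hypothetical list `abundances` holding one positive count per observed type (e.g. how often each haplotype was seen). `chao1` is the standard bias-corrected Chao1 lower bound, shown as one representative classical estimator. `karlin_alpha` and `extrapolate_distinct` are hypothetical helpers built on a known occupancy result (Karlin's infinite urn scheme): if type frequencies are regularly varying with index alpha in (0, 1), the expected number of distinct types grows like n^alpha and the singleton fraction tends to alpha. The extrapolation additionally assumes the slowly varying factor is constant. `top_type_share` is a hypothetical sample-level analogue of the coverage statistic (share of sampled copies carried by the most frequent types), not the per-individual coverage reported above.

```python
"""Sketch: classical vs. power-law richness estimates from observed counts.

Input: `abundances`, a list with one positive count per observed type.
All helpers are hypothetical illustrations, not the paper's estimator.
"""
import math
from collections import Counter


def _check(abundances):
    # Every observed type must have been seen at least once.
    if not abundances or min(abundances) < 1:
        raise ValueError("abundances must be a non-empty list of positive counts")


def chao1(abundances):
    """Bias-corrected Chao1: S_obs + (n-1)/n * f1(f1-1) / (2(f2+1))."""
    _check(abundances)
    f = Counter(abundances)  # f[r] = number of types seen exactly r times
    s_obs, n = len(abundances), sum(abundances)
    f1, f2 = f[1], f[2]
    return s_obs + (n - 1) / n * f1 * (f1 - 1) / (2 * (f2 + 1))


def karlin_alpha(abundances):
    """Singleton fraction f1 / S_obs.

    Assumption: if type frequencies are regularly varying with index
    alpha in (0, 1), Karlin's occupancy limit gives E[K_{n,1}] / E[K_n] -> alpha.
    """
    _check(abundances)
    return Counter(abundances)[1] / len(abundances)


def extrapolate_distinct(abundances, target_n):
    """Projected distinct types at sample size target_n, assuming K_n ~ C * n**alpha
    (the slowly varying factor is treated as constant)."""
    _check(abundances)
    n = sum(abundances)
    return len(abundances) * (target_n / n) ** karlin_alpha(abundances)


def top_type_share(abundances, type_fraction):
    """Share of sampled copies carried by the most frequent `type_fraction` of types."""
    _check(abundances)
    ranked = sorted(abundances, reverse=True)
    k = math.ceil(type_fraction * len(ranked))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    counts = [1, 1, 1, 1, 2, 2, 3, 5]  # toy data: 8 types, 16 copies
    print(chao1(counts))                     # 9.875
    print(karlin_alpha(counts))              # 0.5
    print(extrapolate_distinct(counts, 64))  # 16.0
    print(top_type_share(counts, 0.25))      # 0.5
```

On this toy input, Chao1 adds fewer than two unseen types, while the n^alpha projection keeps growing with sampling depth even though the top quarter of types already accounts for half of the sampled copies. This loosely mirrors the abstract's point that discovery can stay high while coverage is already concentrated in a few frequent types.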
Volume 11
Publication date: 2015